Exploration server on SGML

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Mining millions of metaphors

Internal identifier: 000357 (Main/Exploration); previous: 000356; next: 000358
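The navigation line above links each record to its neighbors. The page does not say how keys are ordered. Because one area key in the record below is 002C57, a plausible assumption is that keys are fixed-width, upper-case hexadecimal counters. Under that assumption only, a hypothetical helper (neighbor_keys is an illustrative name, not part of Dilib) could compute the previous and next keys like this:

```python
from typing import Optional, Tuple


def neighbor_keys(key: str, width: int = 6) -> Tuple[Optional[str], str]:
    """Return (previous, next) record keys.

    Assumption (not documented by Dilib): keys are zero-padded,
    upper-case hexadecimal strings of a fixed width.
    """
    value = int(key, 16)
    # The first key has no predecessor.
    previous = format(value - 1, f"0{width}X") if value > 0 else None
    following = format(value + 1, f"0{width}X")
    return previous, following


print(neighbor_keys("000357"))
print(neighbor_keys("002C5F"))
```

If the keys turned out to be decimal, only the base passed to int and the format code would need to change.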

Authors: Brad Pasanek [United States]; D. Sculley [United States]

Source:

RBID: ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E

Abstract

One of the first decisions made in any research concerns the selection of an appropriate scale of analysis—are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level—the level of words, images, and metaphors—long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.

Url:
DOI: 10.1093/llc/fqn010


Affiliations:


Links to previous steps (curation, corpus...)


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Mining millions of metaphors</title>
<author wicri:is="90%">
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
</author>
<author wicri:is="90%">
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1093/llc/fqn010</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HXZ-5HM0XZ7M-R/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003881</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003881</idno>
<idno type="wicri:Area/Istex/Curation">002C57</idno>
<idno type="wicri:Area/Istex/Checkpoint">000314</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000314</idno>
<idno type="wicri:doubleKey">0268-1145:2008:Pasanek B:mining:millions:of</idno>
<idno type="wicri:Area/Main/Merge">000359</idno>
<idno type="wicri:Area/Main/Curation">000357</idno>
<idno type="wicri:Area/Main/Exploration">000357</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Mining millions of metaphors</title>
<author wicri:is="90%">
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<affiliation wicri:level="1">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Virginia, Charlottesville</wicri:regionArea>
<wicri:noRegion>Charlottesville</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Tufts University, Medford</wicri:regionArea>
<wicri:noRegion>Medford</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author wicri:is="90%">
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
<affiliation wicri:level="1">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Virginia, Charlottesville</wicri:regionArea>
<wicri:noRegion>Charlottesville</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Tufts University, Medford</wicri:regionArea>
<wicri:noRegion>Medford</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2008-09">2008-09</date>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="345">345</biblScope>
<biblScope unit="page" to="360">360</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">One of the first decisions made in any research concerns the selection of an appropriate scale of analysis—are we looking out into the heavens, or down into atoms? To conceive a digital library as a collection of a million books may restrict analysis to only one level of granularity. In this article, we examine the consequences and opportunities resulting from a shift in scale, where the desired unit of interpretation is something smaller than a text: it is a keyword, a motif, or a metaphor. A million books distilled into a billion meaningful components become raw material for a history of language, literature, and thought that has never before been possible. While books herded into genres and organized by period remain irregular, idiosyncratic, and meaningful in only the most shifting and context-dependent ways, keywords or metaphors are lowest common denominators. At the semantic level—the level of words, images, and metaphors—long-term regularity and patterns emerge in collection, analysis, and taxonomy. This article follows the foregoing course of thought through three stages: first, the manual curation of a high quality database of metaphors; second, the expansion of this database through automated and human-assisted techniques; finally, the description of future experiments and opportunities for the application of machine learning, data mining, and natural language processing techniques to help find patterns and meaning concealed at this important level of granularity.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
</list>
<tree>
<country name="États-Unis">
<noRegion>
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
</noRegion>
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<name sortKey="Pasanek, Brad" sort="Pasanek, Brad" uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name>
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
<name sortKey="Sculley, D" sort="Sculley, D" uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name>
</country>
</tree>
</affiliations>
</record>
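The record above can be read with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree on a trimmed excerpt of the record. The original uses the wicri: prefix without declaring it, and a namespace-aware parser rejects that as an unbound prefix, so the excerpt binds it to a placeholder URI (urn:example:wicri is hypothetical). The names summarize and double_key are illustrative. The rule in double_key (ISSN, year, first author's uniqKey, then the first three title words in lower case) is only inferred from the single wicri:doubleKey value in this record and is not documented Dilib behavior.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record; the wicri prefix is declared with a placeholder URI.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<publicationStmt>
<idno type="doi">10.1093/llc/fqn010</idno>
<idno type="wicri:doubleKey">0268-1145:2008:Pasanek B:mining:millions:of</idno>
</publicationStmt>
<sourceDesc><biblStruct>
<analytic>
<title level="a">Mining millions of metaphors</title>
<author><name uniqKey="Pasanek B" first="Brad" last="Pasanek">Brad Pasanek</name></author>
<author><name uniqKey="Sculley D" first="D." last="Sculley">D. Sculley</name></author>
</analytic>
<series>
<title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<imprint>
<date type="published" when="2008-09">2008-09</date>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="345">345</biblScope>
<biblScope unit="page" to="360">360</biblScope>
</imprint>
</series>
</biblStruct></sourceDesc>
</fileDesc></teiHeader></TEI>
</record>"""


def summarize(xml_text: str) -> dict:
    """Pull the main bibliographic fields out of a Dilib/TEI record."""
    root = ET.fromstring(xml_text)
    bibl = root.find(".//sourceDesc/biblStruct")
    analytic = bibl.find("analytic")
    imprint = bibl.find("series/imprint")
    # Page range is split over two biblScope elements (from= and to=).
    pages = imprint.findall("biblScope[@unit='page']")
    first = next(p.get("from") for p in pages if p.get("from"))
    last = next(p.get("to") for p in pages if p.get("to"))
    names = analytic.findall("author/name")
    return {
        "title": analytic.findtext("title[@level='a']"),
        "authors": [n.text for n in names],
        "uniq_keys": [n.get("uniqKey") for n in names],
        "journal": bibl.findtext("series/title[@level='j']"),
        "issn": bibl.findtext("series/idno[@type='ISSN']"),
        "date": imprint.find("date").get("when"),
        "volume": imprint.findtext("biblScope[@unit='volume']"),
        "issue": imprint.findtext("biblScope[@unit='issue']"),
        "pages": f"{first}-{last}",
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
        "double_key": root.findtext(".//publicationStmt/idno[@type='wicri:doubleKey']"),
    }


def double_key(info: dict) -> str:
    """Hypothetical reconstruction of wicri:doubleKey (inferred, not documented)."""
    words = info["title"].lower().split()[:3]
    return ":".join([info["issn"], info["date"][:4], info["uniq_keys"][0], *words])


info = summarize(RECORD)
print(info["title"], info["pages"], info["doi"])
print(double_key(info) == info["double_key"])
```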

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000357 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000357 | SxmlIndent | more
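The two commands above select the same record, provided $EXPLOR_AREA points at the SgmlV1 area directory. This is an assumption, because the page does not define EXPLOR_AREA. The minimal Python sketch below assembles the same pipeline. It uses only the HfdSelect options shown above (-h, -nk) and drops the `more` pager. The helper names and the /opt/wicri root are hypothetical. run_pipeline only works where the Dilib tools HfdSelect and SxmlIndent are on PATH.

```python
import posixpath
import shutil
import subprocess
from typing import List


def explor_step(wicri_root: str) -> str:
    # First form: $WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
    return posixpath.join(wicri_root, "Wicri", "Informatique", "explor", "SgmlV1",
                          "Data", "Main", "Exploration")


def step_from_area(explor_area: str) -> str:
    # Second form: $EXPLOR_AREA/Data/Main/Exploration
    # (assumes EXPLOR_AREA is the SgmlV1 area directory)
    return posixpath.join(explor_area, "Data", "Main", "Exploration")


def hfd_select_argv(step_dir: str, key: str) -> List[List[str]]:
    # HfdSelect -h <step>/biblio.hfd -nk <key> | SxmlIndent   (pager omitted)
    return [["HfdSelect", "-h", posixpath.join(step_dir, "biblio.hfd"), "-nk", key],
            ["SxmlIndent"]]


def run_pipeline(argvs: List[List[str]]) -> str:
    # Connect the two commands with a pipe, as the shell `|` does.
    missing = [argv[0] for argv in argvs if shutil.which(argv[0]) is None]
    if missing:
        raise FileNotFoundError(f"Dilib tools not found on PATH: {missing}")
    producer = subprocess.Popen(argvs[0], stdout=subprocess.PIPE)
    consumer = subprocess.run(argvs[1], stdin=producer.stdout,
                              capture_output=True, text=True, check=True)
    producer.stdout.close()
    producer.wait()  # HfdSelect's exit status is not checked in this sketch
    return consumer.stdout


argvs = hfd_select_argv(explor_step("/opt/wicri"), "000357")
print(" | ".join(" ".join(argv) for argv in argvs))
if all(shutil.which(argv[0]) for argv in argvs):
    print(run_pipeline(argvs))
```

Passing argument lists instead of a shell string avoids quoting problems if a path ever contains spaces.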

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E
   |texte=   Mining millions of metaphors
}}
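When many pages need such links, the template can be generated from the record fields. Below is a minimal sketch of a hypothetical helper, explor_lien, that reproduces the layout above, with parameter names copied from the template. The column alignment is cosmetic, because MediaWiki trims whitespace around named template parameter values.

```python
def explor_lien(fields: dict, min_label_width: int = 10) -> str:
    """Render an {{Explor lien}} wiki template from ordered (parameter, value) pairs."""
    lines = ["{{Explor lien"]
    for name, value in fields.items():  # dicts keep insertion order (Python 3.7+)
        label = f"|{name}="
        # Pad labels so the values line up in one column, always leaving one space.
        lines.append("   " + label.ljust(max(min_label_width, len(label) + 1)) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_lien({
    "wiki": "Wicri/Informatique",
    "area": "SgmlV1",
    "flux": "Main",
    "étape": "Exploration",
    "type": "RBID",
    "clé": "ISTEX:DB439E09F0F5EA1C7B2555FC52451FEDD4A07E3E",
    "texte": "Mining millions of metaphors",
})
print(link)
```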

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021